DO NOT SHOW THIS PORTFOLIO IN CLASS
The Top 2000 is an annual Dutch marathon radio program that plays the 2,000 most popular songs of all time, as determined by a public vote. The highest-ranking songs do not vary much from year to year, but new songs do sometimes enter the list. I am interested in what makes a new popular song suited to the top 50, meaning a short-term hit, and what makes a new song suited to the top 2000.
As for the corpus, I will use a playlist of the 50 newest songs from the top 2000 and compare it to the Spotify Dutch top 50 playlist. When comparing the playlists at track level, I will use the #1 song of each, as I think they reflect the differences very well. The highest-ranked song among the new top 2000 tracks is also the current #1 of the whole list, Rollercoaster by Danny Vera. The current number one of the top 50 is Wellerman (Sea Shanty) by Nathan Evans, in the remix by 220 KID and Billen Ted. These songs were chosen because they sound very different and fit their playlists well. Wellerman incorporates a recent trend from social media, the sea shanty, and mixes it with Electronic Dance Music. This makes it a good song to represent the top 50, which is more focused on trends. Rollercoaster is better suited to be a long-term popular song, with its focus on vocals and instrumentals.
In this storyboard we first gain insight into how the playlists compare on Spotify features such as loudness, acousticness and energy. We then analyse the chromagrams and self-similarity matrices of the two number one songs. Furthermore, we look into the difficulty of determining the tempo of songs in the playlists. Lastly, we train and analyse a classifier that predicts the playlist of a song from its features.
Link to the 50 newest top 2000 songs Link to the top 50 as of February
Looking at the valence, energy and loudness of the playlists, there are some differences. Songs from the top 50 playlist are generally more energetic and louder. The valence is also higher on average for top 50 songs, but it is still quite spread out, whereas few songs from the top 2000 playlist exceed a valence of 0.7.
Acousticness is also an interesting feature. I originally expected it to be one of the biggest differences between the playlists, as in my mind the top 50 contains a lot of electronic music and the new top 2000 songs are more acoustic. In reality this is not the case: there are also more songs with lower acousticness in the new top 2000 songs than in the top 50. The proportion of songs with lower acousticness is, however, higher in the top 50.
On average, the new top 2000 songs are more often in a minor key and the top 50 songs are more often in a major key. This matches the intuition that the top 50 songs feel happier.
The chromagram of Rollercoaster shows that A is the dominant pitch class, which suggests that the song is written in A. There is no overlap visible between the pitch classes. The melody is not clearly visible, as the song is quite long and its melody is short.
The chromagram of the Wellerman remix by 220 KID and Billen Ted shows that C is the dominant pitch class, which suggests that the song is written in C. Not much more can be deduced from it, however.
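Reading the dominant pitch class off a chromagram can be sketched as summing the energy per pitch class over all frames and taking the largest total. The snippet below is a minimal illustration under assumptions: the function name and the toy chroma values are hypothetical and not taken from the actual songs, and the strongest pitch class is only a rough hint at the key, not a full key estimate.

```python
# Pitch-class names in chroma order (index 0 = C).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def dominant_pitch_class(chroma_frames):
    """Return the pitch class with the most summed energy, plus the totals."""
    totals = [0.0] * 12
    for frame in chroma_frames:
        if len(frame) != 12:
            raise ValueError("each chroma frame needs 12 values")
        for index, energy in enumerate(frame):
            totals[index] += energy
    strongest = max(range(12), key=lambda index: totals[index])
    return PITCH_CLASSES[strongest], totals


# Hypothetical toy chromagram: three frames with most energy on A (index 9).
toy_chroma = [
    [0.1, 0.0, 0.2, 0.0, 0.3, 0.0, 0.0, 0.1, 0.0, 0.9, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0, 0.4, 0.0, 0.0, 0.0, 0.0, 0.8, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
]
pitch_class, totals = dominant_pitch_class(toy_chroma)
print(pitch_class)
```

A proper key estimate would compare the summed chroma against major and minor key profiles; this sketch only mirrors the visual reading of the plot.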
The differences between the songs are also visible in the self-similarity matrices. Rollercoaster results in a much more uniform plot, while the Sea Shanty has yellow lines at around 30 and 80 seconds. This matches what you hear when listening to the songs: Sea Shanty can be classified as an EDM song with “drops”, which result in a drastic change in timbre, whereas Rollercoaster is a pop song with more gradual timbre changes. The same holds for pitch: the drops also result in big changes in pitch.
These matrices show that a song is ‘trendier’ by revealing EDM-like features.
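To illustrate why a drop shows up so sharply, here is a minimal sketch of a self-similarity matrix built from per-segment feature vectors. It assumes cosine distance as the comparison measure, and the function names and toy vectors are hypothetical; the matrices discussed above may have been computed with a different distance and normalisation.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two feature vectors (0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat silent/empty segments as maximally different
    return 1.0 - dot / (norm_u * norm_v)


def self_similarity(segments):
    """Pairwise distance matrix between all segments of one track."""
    return [[cosine_distance(u, v) for v in segments] for u in segments]


# Hypothetical toy track: three "verse" segments followed by three "drop"
# segments with a very different timbre vector.
verse = [1.0, 0.1, 0.0]
drop = [0.0, 0.2, 1.0]
ssm = self_similarity([verse, verse, verse, drop, drop, drop])

for row in ssm:
    print(" ".join(f"{value:.2f}" for value in row))
```

In the toy output, segments within the verse or within the drop have a distance near zero, while every verse/drop pair has a distance close to one. That abrupt boundary is the kind of pattern a sudden timbre change produces, whereas gradual changes give a more uniform matrix.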
The tempogram of Rollercoaster is very unclear. The song primarily features a guitar melody and vocals, and tempograms seem to be less clear for songs with these characteristics.
The tempogram of Wellerman is very clear: the song has a very constant tempo of around 120 bpm. This also holds for more songs in the top 50, whose tempograms are clearer than those of most of the new songs from the top 2000.
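The intuition behind a clear tempogram can be sketched with a simple autocorrelation of an onset-strength curve: a steady beat correlates strongly with itself at a lag of one beat period. This is only an illustrative stand-in and not necessarily how the tempograms above were computed; the function name, the frame rate and the synthetic onset curve are all assumptions.

```python
def autocorrelation_tempo(onset_env, frame_rate, bpm_min=60.0, bpm_max=200.0):
    """Estimate tempo (BPM) as the lag with the highest autocorrelation."""
    lag_min = max(1, int(60.0 * frame_rate / bpm_max))
    lag_max = min(int(60.0 * frame_rate / bpm_min), len(onset_env) - 1)
    best_lag, best_score = None, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalised autocorrelation of the onset curve at this lag.
        score = sum(
            onset_env[i] * onset_env[i - lag] for i in range(lag, len(onset_env))
        )
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        raise ValueError("onset envelope is too short for the requested range")
    return 60.0 * frame_rate / best_lag


# Hypothetical onset curve: 10 seconds at 100 frames per second with a
# sharp onset every 0.5 s, i.e. a perfectly steady 120 BPM pulse.
frame_rate = 100
onset_env = [1.0 if i % 50 == 0 else 0.0 for i in range(10 * frame_rate)]
tempo = autocorrelation_tempo(onset_env, frame_rate)
print(tempo)
```

With sharp, regular onsets such as an electronic kick drum, one lag clearly dominates. For a track carried by guitar and vocals the onset curve is softer and less regular, so many lags score similarly and the estimate becomes ambiguous.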
A random forest classifier was trained to predict the playlist of a song. The mosaic on the left shows the performance of this classifier as a bar plot, and the matrix on the right shows the counts. The classifier predicted top 2000 with an accuracy of 35/(35+15) = 70% and top 50 with an accuracy of 32/(32+18) = 64%. These scores are not very good, but I think the classifier is still correct in the feature importance it shows.
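The per-playlist scores follow directly from the counts in the matrix. The sketch below works through that arithmetic, assuming that each row of the matrix holds the songs that truly belong to one playlist and each column the predicted playlist; that orientation and the function name are assumptions, only the four counts come from the text.

```python
def per_class_score(confusion):
    """Fraction of each true class that was predicted correctly."""
    scores = {}
    for true_label, row in confusion.items():
        total = sum(row.values())
        scores[true_label] = row[true_label] / total if total else 0.0
    return scores


# Counts quoted in the text; the row/column orientation is an assumption.
confusion = {
    "top 2000": {"top 2000": 35, "top 50": 15},
    "top 50": {"top 50": 32, "top 2000": 18},
}

scores = per_class_score(confusion)
correct = sum(confusion[label][label] for label in confusion)
total = sum(sum(row.values()) for row in confusion.values())
overall = correct / total

for label, score in scores.items():
    print(f"{label}: {score:.0%}")
print(f"overall: {overall:.0%}")
# top 2000: 70%
# top 50: 64%
# overall: 67%
```

With two equally sized playlists, always guessing one playlist would score 50%, so an overall 67% is better than chance but leaves plenty of room for improvement.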
The most important features for the random forest classifier are Spotify features. The most important one is track length, as top 2000 songs are generally longer than top 50 songs. Danceability is also an important feature, with the top 50 having a higher danceability on average. The difference in valence also has some importance, as we saw in the Spotify feature exploration. Interestingly, acousticness is one of the lowest-ranked features, even though it did show a difference in the violin plot.
Plotting the features also makes it clear why the classifier ranked these features as the most important. Most of the outliers are top 2000 songs. The difference in average length is clearly visible. Danceability is also higher and more clustered for the top 50 songs. The difference in valence is less clear when it is plotted as the point size.
The differences between the new songs from the top 2000 and the current top 50 are more subtle than expected. There are differences on many metrics, but they are not very large. The differences in Spotify features do show clearly in the visualizations. The chromagrams did not provide interesting insight into the differences. The self-similarity matrices provided a useful comparison per song, and in the structure of the two songs you can see patterns that make a song fit the top 50 or the top 2000. However, I do think that these differences between the matrices are too subtle to be useful for prediction.
The random forest classifier provided interesting results, and I think that using more top 50 playlists would result in better performance. The classifier also showed which features were interesting, and this was confirmed by the plots of these features.
In conclusion, what makes a new song fit the top 2000 instead of the top 50 is a combination of track length, danceability, valence, energy and whether the song is in a major or minor key. In combination, these factors will not give a definitive result, only an indication.
Finally, maybe it is good that this is difficult to predict: music is a form of artistic expression, and unpredictability and creativity are part of what makes art interesting.